Exploitation vs. exploration: choosing a supplier in an environment of incomplete information

Authors

  • Rina Azoulay-Schwartz
  • Sarit Kraus
  • Jonathan Wilkenfeld
Abstract

An agent operating in the real world must often choose between maximizing its expected utility according to its current knowledge about the world, and trying to learn more about the world, since this may improve its future gains. This problem is known as the trade-off between exploitation and exploration. In this research, we consider this problem in the context of electronic commerce. An agent intends to buy a particular product (goods or service). There are several potential suppliers of this product, but they differ in their quality, and in the price charged. The buyer cannot observe the average quality of each product, but he has some knowledge about the quality of previous goods purchased from the suppliers. On the one hand, the buyer is motivated to buy the goods from the supplier with the highest expected product quality, deducting the product price. However, when buying from a lesser known supplier, the buyer can learn about its quality and this can help him in the future, when he will purchase more products of this type. We show the similarity of the suppliers problem to the k-armed bandit problem, and we suggest solving the suppliers problem by evaluating Gittins indices and choosing the supplier with the optimal index. We demonstrate how Gittins indices are calculated in real world situations, where deals of different magnitudes may exist, and where product prices may vary. Finally, we consider the existence of suppliers with no history, and suggest how the original Gittins indices can be adapted in order to consider this extension.
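
The index-based choice described in the abstract can be illustrated with a small sketch. This is not the authors' procedure: it assumes, purely for illustration, that each deal is either satisfactory (reward 1) or not (reward 0), that the buyer's belief about a supplier is a Beta distribution over its satisfaction rate, and that prices and deal magnitudes are ignored. The names `gittins_index` and `choose_supplier`, the discount factor, and the truncation horizon are all hypothetical choices. The index is approximated by the standard calibration idea: find the fixed per-deal reward at which the buyer is indifferent between the uncertain supplier and a fully known one.

```python
from functools import lru_cache


def gittins_index(good, bad, discount=0.9, horizon=50, tol=1e-4):
    """Approximate Gittins index of a supplier under a Beta(good, bad) belief.

    Assumption: `good` and `bad` are positive counts of satisfactory and
    unsatisfactory deals, including any prior pseudo-counts.
    """

    def advantage_of_buying(lam):
        # Value of switching forever to a known supplier paying lam per deal.
        retire = lam / (1.0 - discount)

        @lru_cache(maxsize=None)
        def value(a, b, depth):
            p = a / (a + b)  # posterior mean satisfaction rate
            if depth == horizon:
                # Truncation: stop learning and commit to the better option.
                return max(lam, p) / (1.0 - discount)
            return max(retire, buy(a, b, depth))

        def buy(a, b, depth):
            # Expected discounted value of one more deal with this supplier.
            p = a / (a + b)
            return (p * (1.0 + discount * value(a + 1, b, depth + 1))
                    + (1.0 - p) * discount * value(a, b + 1, depth + 1))

        return buy(good, bad, 0) - retire

    # Bisection on lam: buying is preferred exactly when lam is below the index.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if advantage_of_buying(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def choose_supplier(histories, discount=0.9):
    """Return the supplier name with the highest approximate index.

    `histories` maps a (hypothetical) supplier name to (good, bad) counts.
    """
    return max(histories,
               key=lambda name: gittins_index(*histories[name], discount=discount))


if __name__ == "__main__":
    # Two suppliers with the same observed satisfaction rate (50%),
    # but very different amounts of purchase history.
    histories = {"well_known": (10, 10), "lesser_known": (1, 1)}
    for name, (good, bad) in histories.items():
        print(name, round(gittins_index(good, bad), 3))
    print("chosen:", choose_supplier(histories))
```

In this sketch both suppliers have the same expected quality, so a purely exploiting buyer would be indifferent between them. The index adds a premium for what can still be learned, which is why the lesser known supplier is ranked higher here.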

Similar resources

Towards a Theory of Balancing Exploration and Exploitation in Probabilistic Environments

Learning to make good choices in a probabilistic environment requires that the Decision Maker resolves the tension between exploration (learning about all available options) and exploitation (consistently choosing the best option in order to maximize rewards). We present a mathematical learning model that makes selections in a repeated-choice probabilistic task based on the expected payoff asso...

An Improved Bat Algorithm with Grey Wolf Optimizer for Solving Continuous Optimization Problems

Metaheuristic algorithms are used to solve NP-hard optimization problems. These algorithms have two main components, i.e. exploration and exploitation, and try to strike a balance between exploration and exploitation to achieve the best possible near-optimal solution. The bat algorithm is one of the metaheuristic algorithms with poor exploration and exploitation. In this paper, exploration and ...

Augmented Downhill Simplex a Modified Heuristic Optimization Method

Augmented Downhill Simplex Method (ADSM) is introduced here, that is a heuristic combination of Downhill Simplex Method (DSM) with Random Search algorithm. In fact, DSM is an interpretable nonlinear local optimization method. However, it is a local exploitation algorithm; so, it can be trapped in a local minimum. In contrast, random search is a global exploration, but less efficient. Here, rand...

Optimal Strategy under Unknown Stochastic Environment: Nonparametric Lob-Pass Problem

The bandit problem consists of two factors, one being exploration or the collection of information on the environment and the other being the exploitation or taking benefit by choosing the optimal action in the uncertain environment. It is necessary to choose only the optimal actions for the exploitation, while the exploration or collection of information requires to take a variety of (non-optim...

Tuning Continual Exploration in Reinforcement Learning

This paper presents a model allowing to tune continual exploration in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action. Then, the exploration/exploitation tradeoff is formulated as a global optimization...

Journal title:
  • Decision Support Systems

Volume 38  Issue 

Pages  -

Publication date 2004